Second - level Instruction Cache Thread Processing Unit Thread Processing Unit Thread Processing Unit Instruction Cache First - level First - level First - level Instruction Cache Instruction Cache Execution

Authors

H. Kazi

David J. Lilja

Abstract

This paper presents a new parallelization model, called coarse-grained thread pipelining, for exploiting speculative coarse-grained parallelism from general-purpose application programs in shared-memory multiprocessor systems. This parallelization model, which is based on the fine-grained thread pipelining model proposed for the superthreaded architecture [11, 12], allows concurrent execution of loop iterations in a pipelined fashion with run-time data-dependence checking and control speculation. The speculative execution combined with the run-time dependence checking allows the parallelization of a variety of program constructs that cannot be parallelized with existing run-time parallelization algorithms. The pipelined execution of loop iterations in this new technique results in lower parallelization overhead than in other existing techniques. We evaluated the performance of this new model using some real applications and a synthetic benchmark. These experiments show that programs with a sufficiently large grain size compared to the parallelization overhead obtain significant speedup using this model. The results from the synthetic benchmark provide a means for estimating the performance that can be obtained from application programs that will be parallelized with this model. The library routines developed for this thread pipelining model are also useful for evaluating the correctness of the codes generated by the superthreaded compiler and in debugging and verifying the simulator for the superthreaded processor.
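
The idea of running loop iterations speculatively and checking data dependences at run time can be illustrated with a small sketch. It is not the paper's library or algorithm: the names `TrackedMemory` and `run_speculative`, the `window` parameter, and the re-execution policy are all assumptions made for illustration. Concurrency is simulated sequentially, and neither the pipeline stages nor control speculation are modeled. Each iteration in a window runs against the same memory snapshot with buffered writes; iterations then commit in order, and an iteration that read a location written by an earlier iteration in the window is re-executed.

```python
from typing import Callable, Dict, Set

Memory = Dict[str, int]


class TrackedMemory:
    """Hypothetical per-iteration view: buffers writes and records reads."""

    def __init__(self, base: Memory) -> None:
        self._base = base
        self.reads: Set[str] = set()
        self.writes: Memory = {}

    def load(self, key: str) -> int:
        if key in self.writes:        # the iteration sees its own writes
            return self.writes[key]
        self.reads.add(key)           # remember what was read from shared state
        return self._base[key]

    def store(self, key: str, value: int) -> None:
        self.writes[key] = value      # buffered until commit


def run_speculative(body: Callable[[int, TrackedMemory], None],
                    n_iters: int, memory: Memory, window: int = 4) -> int:
    """Run `body` for iterations 0..n_iters-1; return the re-execution count."""
    reexecutions = 0
    start = 0
    while start < n_iters:
        batch = range(start, min(start + window, n_iters))
        # Speculation phase: every iteration in the window sees the same
        # snapshot, as if the iterations had started concurrently.
        snapshot = dict(memory)
        views = []
        for i in batch:
            view = TrackedMemory(snapshot)
            body(i, view)
            views.append(view)
        # Commit phase: in iteration order, with a run-time dependence check.
        written: Set[str] = set()
        for i, view in zip(batch, views):
            if view.reads & written:
                # An earlier iteration wrote something this one read:
                # discard the speculative result and re-execute.
                view = TrackedMemory(memory)
                body(i, view)
                reexecutions += 1
            memory.update(view.writes)
            written.update(view.writes)
        start = batch.stop
    return reexecutions
```

In this sketch the final memory contents match a sequential run of the loop, and the re-execution count gives a rough sense of how often cross-iteration dependences defeat the speculation.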


Similar resources

The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture

Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism in application programs. A single-threaded sequencing mechanism needs speculative execution beyond conditional branches in order to exploit more instruction-level parallelism. In addition, an aggressive multithreaded architecture should also use thread-level control speculation in order to exploit ...


Prefetch Threads for Database Operations on a Simultaneous Multi-threaded Processor

Simultaneous Multi-threading (SMT) has been developed to increase instruction level parallelism by allowing instructions from a different thread to run during a stall. Inter-thread cache interference, however, might limit the benefit of running multiple independent threads. SMT processors can be utilized in a different model, where a helper thread is used to prefetch cache blocks for the main e...


An Instruction Cache Architecture for Parallel Execution of Java Threads

Designing a Java processor supporting horizontal multithreading has been becoming more attractive as network computing gains importance. Different from the traditional superscalar processors that issue multiple instructions from a single instruction stream to exploit the instruction level parallelism (ILP), the horizontal multithreading Java processors issue multiple instructions (bytecodes) fr...


Optimising long-latency-load-aware fetch policies for SMT processors

Simultaneous Multithreading (SMT) processors fetch instructions from several threads and, in this way, the available Instruction Level Parallelism (ILP) of each thread is exposed to the processor. In an SMT processor the fetch engine has the additional level of freedom, compared to a super-scalar processor, to select independent instructions. The fetch engine determines how shared resources are...


Clustering Cores for Parallel Thread Execution

In recent years, we have observed a strong trend towards using accelerators, such as GPUs, to speed up scientific applications. This results in a complex heterogeneous system in which traditional CPUs are used for the execution of sequential threads, while GPUs are used for accelerating parallel threads. Instead of following this trend, this paper introduces a new explicitly parallel instructio...



Publication date: 2001
